Proteochemometric modeling in a Bayesian framework

Authors

  • Isidro Cortes-Ciriano
  • Gerard J. P. van Westen
  • Eelke B. Lenselink
  • Daniel S. Murrell
  • Andreas Bender
  • Therese E. Malliavin

Abstract

Proteochemometrics (PCM) is a bioactivity modeling approach that models the relationship between protein and chemical information. Gaussian Processes (GP), which are based on Bayesian inference, provide the most objective estimate of the uncertainty of the predictions, thus permitting the evaluation of the applicability domain (AD) of the model. Furthermore, the experimental error on bioactivity measurements can be used as input for this probabilistic model. In this study, we apply GP with a panel of kernels to three diverse (and multispecies) PCM datasets. The first dataset consisted of the binding affinities of 10,999 small-molecule ligands on 8 human and rat adenosine receptors. The second consisted of the catalytic activity of four dengue virus NS3 proteases on 56 small peptides. Finally, we gathered bioactivity information for small-molecule ligands on 91 aminergic GPCRs from 9 different species, yielding a dataset of 24,593 datapoints with a matrix completeness of only 2.43%. GP models trained on these datasets are statistically sound and perform at the same level of statistical significance as Support Vector Machines (SVM), with [Formula: see text] values on the external dataset ranging from 0.68 to 0.92 and RMSEP values close to the experimental error. Furthermore, the best GP models, obtained with the normalized polynomial and radial kernels, provide confidence intervals for the predictions that agree with the cumulative Gaussian distribution. GP models were also interpreted in terms of individual targets and ligand descriptors. For the dengue dataset, interpreting the model in terms of the amino-acid positions in the tetrapeptide ligands gave biologically meaningful results.
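To make the GP ingredients named above concrete (kernels, experimental error as input, predictive uncertainty, RMSEP, and a check of confidence intervals against the cumulative Gaussian distribution), here is a minimal sketch. It is not the authors' implementation. It assumes numpy, fixed rather than optimized kernel hyperparameters, a constant prior mean equal to the training mean, and experimental error given as a per-measurement standard deviation added to the kernel diagonal. All function names (`rbf_kernel`, `normalized_poly_kernel`, `gp_fit_predict`, `rmsep`, `interval_coverage`) are hypothetical, and the data in the demo are synthetic.

```python
import functools
import math

import numpy as np


def rbf_kernel(A, B, sigma=1.0):
    """Radial (Gaussian) kernel: k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def normalized_poly_kernel(A, B, degree=2, c=1.0):
    """Polynomial kernel (x.y + c)^d rescaled so that k(x, x) = 1."""
    k = (A @ B.T + c) ** degree
    ka = ((A * A).sum(1) + c) ** degree  # k(a, a) for each row of A
    kb = ((B * B).sum(1) + c) ** degree  # k(b, b) for each row of B
    return k / np.sqrt(np.outer(ka, kb))


def gp_fit_predict(X, y, noise_sd, Xs, kernel, noise_sd_test=None):
    """Exact GP regression; returns predictive mean and standard deviation at Xs.

    noise_sd: per-measurement experimental error (standard deviation) of y,
    added to the diagonal of the training kernel matrix.
    """
    mu = y.mean()  # assumption: constant prior mean equal to the training mean
    # Small jitter on the diagonal keeps the Cholesky factorization stable.
    K = kernel(X, X) + np.diag(noise_sd ** 2) + 1e-10 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - mu))  # K^-1 (y - mu)
    Ks = kernel(Xs, X)
    mean = mu + Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    # Latent variance k(x*, x*) - k*^T K^-1 k*; the Xs x Xs matrix is built only for its diagonal.
    var = np.maximum(np.diag(kernel(Xs, Xs)) - (v ** 2).sum(0), 0.0)
    if noise_sd_test is not None:
        var = var + noise_sd_test ** 2  # variance of a new noisy measurement
    return mean, np.sqrt(var)


def rmsep(y_true, y_pred):
    """Root mean squared error of prediction."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def interval_coverage(y_true, mean, sd, z=1.96):
    """Observed fraction of points inside mean +/- z*sd vs. the Gaussian expectation."""
    observed = float(np.mean(np.abs(y_true - mean) <= z * sd))
    expected = math.erf(z / math.sqrt(2.0))  # P(|Z| <= z) = 2*Phi(z) - 1
    return observed, expected


if __name__ == "__main__":
    # Synthetic stand-ins for descriptor vectors and bioactivity values.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    err = np.full(200, 0.3)  # hypothetical experimental error per measurement
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=err)
    kernel = functools.partial(rbf_kernel, sigma=2.0)
    mean, sd = gp_fit_predict(X[:150], y[:150], err[:150], X[150:], kernel,
                              noise_sd_test=err[150:])
    print("RMSEP:", rmsep(y[150:], mean))
    print("coverage (observed, expected):", interval_coverage(y[150:], mean, sd))
```

If the intervals are well calibrated, the observed coverage should be close to the expected value for each choice of `z`; repeating the check over several `z` values compares the predictions with the cumulative Gaussian distribution as a whole.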

Similar resources

Inverse Problems in Imaging Systems and the General Bayesian Inversion Framework

In this paper, first a great number of inverse problems which arise in instrumentation, in computer imaging systems and in computer vision are presented. Then a common general forward modeling for them is given and the corresponding inversion problem is presented. Then, after showing the inadequacy of the classical analytical and least square methods for these ill posed inverse problems, a Baye...

Modeling the Interaction Space of Biological Macromolecules: A Proteochemometric Approach : Applications for Drug Discovery and Development

Cover image: A three-dimensional representation of the interaction universe of biological macromolecules and low molecular weight compounds. Each red sphere represents a chemical sub-space of macromolecules and each gray sphere corresponds to a chemical sub-space of ligands. Lines indicate multiple macromolecule-ligand interactions. Proteochemometric models describe interaction universe mathema...

A New Acceptance Sampling Design Using Bayesian Modeling and Backwards Induction

In acceptance sampling plans, the decisions on either accepting or rejecting a specific batch is still a challenging problem. In order to provide a desired level of protection for customers as well as manufacturers, in this paper, a new acceptance sampling design is proposed to accept or reject a batch based on Bayesian modeling to update the distribution function of the percentage of nonconfor...

Improved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition

Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...

Proteochemometric Modeling of the Antigen-Antibody Interaction: New Fingerprints for Antigen, Antibody and Epitope-Paratope Interaction

Despite the high specificity between antigen and antibody binding, similar epitopes can be recognized or cross-neutralized by paratopes of antibody with different binding affinities. How to accurately characterize this slight variation which may or may not change the antigen-antibody binding affinity is a key issue in this area. In this report, by combining cylinder model with shell structure m...

Volume 6

Publication date: 2014